47 research outputs found

    Common motifs in scientific workflows: An empirical analysis

    While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is to provide abstractions that give a high-level view of the activities undertaken within workflows. As a first step towards such abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from the Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data-intensive activities that are observed in workflows (data-oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow-oriented motifs). These motifs can be useful for informing workflow designers about good and bad practices in workflow development, and for informing the design of automated tools that generate workflow abstractions.
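
    As a rough illustration of how such a motif catalogue might be used downstream, the Python sketch below tags workflow steps with motif labels and derives an abstracted view that keeps only the analysis steps. The motif names, step names, and the abstraction rule are simplified assumptions made here for illustration, not the catalogue or tooling reported in the paper.

        # Minimal sketch (assumed labels): annotate workflow steps with motifs and
        # derive a high-level view by hiding data preparation/adaptation steps.
        from enum import Enum
        from dataclasses import dataclass

        class Motif(Enum):
            DATA_RETRIEVAL = "data retrieval"
            DATA_PREPARATION = "data preparation"
            DATA_ANALYSIS = "data analysis"
            DATA_VISUALISATION = "data visualisation"

        @dataclass
        class Step:
            name: str
            motif: Motif

        # Toy workflow: only one step is scientifically significant; the rest is
        # the plumbing that makes workflows hard to understand and reuse.
        workflow = [
            Step("fetch_sequences", Motif.DATA_RETRIEVAL),
            Step("reformat_fasta", Motif.DATA_PREPARATION),
            Step("run_alignment", Motif.DATA_ANALYSIS),
            Step("plot_scores", Motif.DATA_VISUALISATION),
        ]

        abstract_view = [s.name for s in workflow if s.motif is Motif.DATA_ANALYSIS]
        print(abstract_view)  # ['run_alignment']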

    Common challenges and requirements

    Research infrastructures available for researchers in environmental and Earth science are diverse and highly distributed; dedicated research infrastructures exist for atmospheric science, marine science, solid Earth science, biodiversity research, and more. These infrastructures aggregate and curate key research datasets and provide consolidated data services for a target research community, but they also often overlap in scope and ambition, sharing data sources, sometimes even sites, using similar standards, and ultimately all contributing data that will be essential to addressing the societal challenges that face environmental research today. Thus, while their diversity poses a problem for open science and multidisciplinary research, their commonalities mean that they often face similar technical problems and consequently have common requirements when implementing best practices in curation, cataloguing, identification and citation, and other related core topics for data science. In this chapter, we review the requirements gathering performed in the context of the cluster of European environmental and Earth science research infrastructures participating in the ENVRI community, and survey the common challenges identified during that process.

    ISPIDER Central: an integrated database web-server for proteomics

    Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature proteomic repositories including the PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments by different groups and stored in different databases, and to view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, and integration with the Dasty2 client supports any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.
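
    The collation step can be pictured with a small Python sketch: identifications from several repositories only become comparable once their accessions have been mapped to a common identifier, the role played by PICR in ISPIDER Central. The accession table, repository records, and peptide strings below are made-up illustrations, not output of the real services.

        # Sketch: collate peptide identifications from several repositories after
        # resolving heterogeneous protein accessions to one canonical identifier
        # (the resolution that ISPIDER Central delegates to the EBI PICR service).
        from collections import defaultdict

        # Hypothetical cross-reference table (PICR-style resolution).
        accession_map = {
            "IPI00021439": "P60709",
            "ENSP00000349960": "P60709",
            "P60709": "P60709",
        }

        # Hypothetical identifications reported by different repositories.
        repositories = {
            "PRIDE":        [("IPI00021439", "AGFAGDDAPR")],
            "PeptideAtlas": [("ENSP00000349960", "DSYVGDEAQSK")],
            "PepSeeker":    [("P60709", "AGFAGDDAPR")],
        }

        collated = defaultdict(list)
        for repo, hits in repositories.items():
            for accession, peptide in hits:
                canonical = accession_map.get(accession)
                if canonical is not None:
                    collated[canonical].append((repo, peptide))

        # All three hits now group under the same canonical accession.
        print(dict(collated))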

    A Federated Design for a Neurobiological Simulation Engine: The CBI Federated Software Architecture

    Simulator interoperability and extensibility have become growing requirements in computational biology. To address this, we have developed a federated software architecture. It is federated in that it unites independent, disparate systems under a single cohesive view; it provides interoperability through its capability to communicate, execute programs, or transfer data among different independent applications; and it supports extensibility by enabling simulator expansion or enhancement without the need for major changes to system infrastructure. Historically, simulator interoperability has relied on the development of declarative markup languages such as the neuron modeling language NeuroML, while simulator extension has typically occurred through modification of existing functionality. The software architecture we describe here allows for both of these approaches. However, it is designed to support alternative paradigms of interoperability and extensibility through the provision of logical relationships and defined application programming interfaces. These allow any appropriately configured component or software application to be incorporated into a simulator. The architecture defines independent functional modules that run stand-alone. They are arranged in logical layers that naturally correspond to the occurrence of high-level data (biological concepts) versus low-level data (numerical values) and distinguish data from control functions. The modular nature of the architecture and its independence from a given technology facilitate communication about similar concepts and functions for both users and developers. It provides several advantages for multiple independent contributions to software development. Importantly, these include: (1) reduction in the complexity of individual simulator components compared to the complexity of a complete simulator, (2) documentation of individual components in terms of their inputs and outputs, (3) easy removal or replacement of unnecessary or obsolete components, (4) stand-alone testing of components, and (5) clear delineation of the development scope of new components.
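
    A minimal Python sketch of the kind of layered, interface-driven decomposition described above, with a high-level model layer, a low-level numerical layer, and a separate control component. The module names and method signatures are invented for illustration and are not the CBI application programming interfaces.

        # Sketch: small, independent modules behind explicit interfaces, so that
        # high-level (biological) layers never manipulate low-level (numerical)
        # details directly, and control logic stays separate from data handling.
        from abc import ABC, abstractmethod

        class ModelContainer(ABC):
            """High-level layer: holds the biological model description."""
            @abstractmethod
            def lower(self) -> dict:
                """Translate biological concepts into a numerical state."""

        class SolverCore(ABC):
            """Low-level layer: advances the numerical state."""
            @abstractmethod
            def step(self, state: dict, dt: float) -> dict: ...

        class Scheduler:
            """Control layer: wires modules together without knowing their internals."""
            def __init__(self, model: ModelContainer, solver: SolverCore):
                self.model, self.solver = model, solver

            def run(self, steps: int, dt: float) -> dict:
                state = self.model.lower()
                for _ in range(steps):
                    state = self.solver.step(state, dt)
                return state

        # Any module implementing these interfaces can be developed, documented,
        # tested stand-alone, and replaced without touching the rest of the system.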

    A PROV encoding for provenance analysis using deductive rules

    PROV is a specification, promoted by the World Wide Web Consortium, for recording the provenance of web resources. It includes a schema, consistency constraints and inference rules on the schema, and a language for recording provenance facts. In this paper we describe an implementation of PROV that is based on the DLV Datalog engine. We argue that the deductive databases paradigm, which underpins the Datalog model, is a natural choice for expressing at the same time (i) the intensional features of the provenance model, namely its consistency constraints and inference rules, (ii) its extensional features, i.e., sets of provenance facts (called a provenance graph), and (iii) declarative recursive queries on the graph. The deductive and constraint-solving capabilities of DLV can be used to validate a graph against the constraints and to derive new provenance facts. We provide an encoding of the PROV rules as Datalog rules and constraints, and illustrate the use of the deductive capabilities both for queries and for constraint validation, namely to detect inconsistencies in the graphs. The DLV code, along with a parser that maps the PROV assertion language to Datalog syntax, is publicly available.
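
    To give a flavour of the deductive style, the Python sketch below performs naive Datalog-like forward chaining over a handful of provenance facts. The single rule (transitive closure of derivation) and the accompanying cycle check are illustrative stand-ins chosen here; they are not quoted from the PROV specification or from the authors' DLV encoding.

        # Sketch: fixpoint saturation of provenance facts under one illustrative
        # rule, wasDerivedFrom(x, z) :- wasDerivedFrom(x, y), wasDerivedFrom(y, z).
        facts = {
            ("wasDerivedFrom", "report_v2", "report_v1"),
            ("wasDerivedFrom", "report_v1", "raw_data"),
            ("wasGeneratedBy", "report_v2", "editing_activity"),
        }

        def saturate(facts):
            facts = set(facts)
            changed = True
            while changed:                      # naive fixpoint computation
                changed = False
                derived = {(s, o) for p, s, o in facts if p == "wasDerivedFrom"}
                for s, m in derived:
                    for m2, o in derived:
                        new = ("wasDerivedFrom", s, o)
                        if m == m2 and new not in facts:
                            facts.add(new)
                            changed = True
            return facts

        closure = saturate(facts)
        print(("wasDerivedFrom", "report_v2", "raw_data") in closure)   # True

        # Constraint validation in the same spirit: flag derivation cycles.
        violations = [t for t in closure if t[0] == "wasDerivedFrom" and t[1] == t[2]]
        print(violations)                                                # []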

    Ontologías para modelar la Investigación Científica en Ingeniería Civil (Ontologies for Modelling Scientific Research in Civil Engineering)


    Scalable Saturation of Streaming RDF Triples

    In the Big Data era, RDF data are produced in high volumes. While there exist proposals for reasoning over large RDF graphs using big data platforms, there is a dearth of solutions that do so in environments where RDF data are dynamic, and where new instance and schema triples can arrive at any time. In this work, we present the first solution for reasoning over large streams of RDF data using big data platforms. In doing so, we focus on the saturation operation, which seeks to infer implicit RDF triples given RDF Schema or OWL constraints. Indeed, unlike existing solutions which saturate RDF data in bulk, our solution carefully identifies the fragment of the existing (and already saturated) RDF dataset that needs to be considered given the fresh RDF statements delivered by the stream. Thereby, it performs the saturation in an incremental manner. Experimental analysis shows that our solution outperforms existing bulk-based saturation solutions.
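
    The incremental flavour of the approach can be illustrated with a minimal Python sketch restricted to a single RDFS rule (type propagation through rdfs:subClassOf): each incoming triple only triggers re-examination of the fragment of the already saturated set it can interact with. The triple encoding, data structures, and example triples are simplifications assumed here, not the authors' big-data implementation.

        # Sketch of incremental saturation for one RDFS rule:
        #   (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
        TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

        saturated = set()      # triples seen or inferred so far
        superclasses = {}      # class -> set of known superclasses

        def add_triple(t):
            """Process one streamed triple; return the newly inferred triples."""
            if t in saturated:
                return set()
            saturated.add(t)
            s, p, o = t
            if p == SUBCLASS:
                superclasses.setdefault(s, set()).add(o)
                # only existing instances of s can yield new type triples
                candidates = {(x, TYPE, o) for (x, q, c) in list(saturated)
                              if q == TYPE and c == s}
            elif p == TYPE:
                candidates = {(s, TYPE, d) for d in superclasses.get(o, set())}
            else:
                candidates = set()
            inferred = set()
            for c in candidates:
                if c not in saturated:
                    inferred |= {c} | add_triple(c)   # recurse on fresh inferences
            return inferred

        # Instance and schema triples arriving on the stream:
        print(add_triple((":alice", TYPE, ":Student")))       # set()
        print(add_triple((":Student", SUBCLASS, ":Person")))  # {(':alice', 'rdf:type', ':Person')}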

    Fostering Scientific Workflow Preservation through Discovery of Substitute Services.

    Scientific workflows are increasingly gaining momentum as the new paradigm for modeling and enacting scientific experiments. The value of a workflow specification does not end once it is enacted. Indeed, workflow specifications encapsulate knowledge that documents scientific experiments, and are, therefore, worth preserving. Our experience suggests that workflow preservation is frequently hampered by the volatility of the constituent service operations when these operations are supplied by third-party providers. To deal with this issue, we propose a heuristic for locating substitutes that are able to replace unavailable service operations within workflows. The proposed method uses the data links connecting inputs and outputs of service operations in existing workflow specifications to locate operations with parameters compatible with those of the missing operations. Furthermore, it exploits provenance traces collected from past executions of workflows to ensure that candidate substitutes perform tasks similar to those of the missing operations. The effectiveness of the proposed method has been empirically assessed. © 2011 IEEE
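
    The general shape of such a matching heuristic can be sketched in a few lines of Python: candidate operations are ranked by how compatible their input and output parameters are with those of the unavailable operation, and provenance-derived behavioural similarity would then refine the ranking. The operation records and scoring function below are crude, invented illustrations rather than the heuristic proposed in the paper.

        # Toy sketch: rank candidate substitutes for a missing service operation by
        # overlap between parameter types (the real heuristic also exploits data
        # links in workflow specifications and provenance traces of executions).

        def parameter_compatibility(missing, candidate):
            """Fraction of the missing operation's parameter types that the
            candidate also consumes or produces."""
            wanted = missing["inputs"] + missing["outputs"]
            offered = set(candidate["inputs"] + candidate["outputs"])
            return sum(t in offered for t in wanted) / len(wanted) if wanted else 0.0

        missing_op = {"name": "blast_search",
                      "inputs": ["fasta_sequence"], "outputs": ["alignment_report"]}

        candidates = [
            {"name": "ncbi_blast", "inputs": ["fasta_sequence"], "outputs": ["alignment_report"]},
            {"name": "fetch_pdb",  "inputs": ["pdb_id"],         "outputs": ["structure_file"]},
        ]

        ranked = sorted(candidates,
                        key=lambda c: parameter_compatibility(missing_op, c),
                        reverse=True)
        print([c["name"] for c in ranked])   # ['ncbi_blast', 'fetch_pdb']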

    Streaming saturation for large RDF graphs with dynamic schema information

    In the Big Data era, RDF data are produced in high volumes. While there exist proposals for reasoning over large RDF graphs using big data platforms, there is a dearth of solutions that do so in environments where RDF data are dynamic, and where new instance and schema triples can arrive at any time. In this work, we present the first solution for reasoning over large streams of RDF data using big data platforms. In doing so, we focus on the saturation operation, which seeks to infer implicit RDF triples given RDF Schema constraints. Indeed, unlike existing solutions which saturate RDF data in bulk, our solution carefully identifies the fragment of the existing (and already saturated) RDF dataset that needs to be considered given the fresh RDF statements delivered by the stream. Thereby, it performs the saturation in an incremental manner. Experimental analysis shows that our solution outperforms existing bulk-based saturation solutions.